Simultaneous Clustering of Multiple Gene Expression and Physical Interaction Datasets

نویسندگان

  • Manikandan Narayanan
  • Adrian Vetta
  • Eric E. Schadt
  • Jun Zhu
چکیده

Many genome-wide datasets are routinely generated to study different aspects of biological systems, but integrating them to obtain a coherent view of the underlying biology remains a challenge. We propose simultaneous clustering of multiple networks as a framework to integrate large-scale datasets on the interactions among and activities of cellular components. Specifically, we develop an algorithm JointCluster that finds sets of genes that cluster well in multiple networks of interest, such as coexpression networks summarizing correlations among the expression profiles of genes and physical networks describing protein-protein and protein-DNA interactions among genes or gene-products. Our algorithm provides an efficient solution to a well-defined problem of jointly clustering networks, using techniques that permit certain theoretical guarantees on the quality of the detected clustering relative to the optimal clustering. These guarantees coupled with an effective scaling heuristic and the flexibility to handle multiple heterogeneous networks make our method JointCluster an advance over earlier approaches. Simulation results showed JointCluster to be more robust than alternate methods in recovering clusters implanted in networks with high false positive rates. In systematic evaluation of JointCluster and some earlier approaches for combined analysis of the yeast physical network and two gene expression datasets under glucose and ethanol growth conditions, JointCluster discovers clusters that are more consistently enriched for various reference classes capturing different aspects of yeast biology or yield better coverage of the analysed genes. These robust clusters, which are supported across multiple genomic datasets and diverse reference classes, agree with known biology of yeast under these growth conditions, elucidate the genetic control of coordinated transcription, and enable functional predictions for a number of uncharacterized genes.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis

Recognizing genes with distinctive expression levels can help in prevention, diagnosis and treatment of the diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering the gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...

متن کامل

The Simultaneous Effect of Aerobic Training and Octopamine on the Mitophagy of Brown Adipose Tissue after Induction of Intoxication through Deeply Heated Oils in Male Wistar Rats

Introduction: Deep heated oils produce toxins that endanger people's health. The current study aims to investigate the simultaneous effect of aerobic training and Octopamine consumption on the mitophagy of brown adipose tissue after the induction of intoxication through deeply heated oil in male wistar rats. Method: 40 male wistar rats after four weeks of feedi...

متن کامل

The Effect of Drought Stresses, Fusarium Culmorum and Heterodera Filipjevi and their Interactions on the Expression Pattern of Transcription Factor Gene NAC69-3 in Bread Wheat

SExtended Abstract Introduction and Objective: Small grain cereals such as wheat, are affected by types of destructive environmental factors such as abiotic and biotic stresses that severely reduce crop yields. To cope with these conditions, transcription factors cause plant resistance to these stresses by activating or suppressing the expression of genes involved in the resistance responses....

متن کامل

Collective Analysis of Multiple High - Throughput Gene Expression Datasets

Modern technologies have resulted in the production of numerous high-throughput biological datasets. However, the pace of development of capable computational methods does not cope with the pace of generation of new high-throughput datasets. Amongst the most popular biological high-throughput datasets are gene expression datasets (e.g. microarray datasets). This work targets this aspect by prop...

متن کامل

Mammalian Eye Gene Expression Using Support Vector Regression to Evaluate a Strategy for Detecting Human Eye Disease

Background and purpose: Machine learning is a class of modern and strong tools that can solve many important problems that nowadays humans may be faced with. Support vector regression (SVR) is a way to build a regression model which is an incredible member of the machine learning family. SVR has been proven to be an effective tool in real-value function estimation. As a supervised-learning appr...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره 6  شماره 

صفحات  -

تاریخ انتشار 2010